Key Diagonal Blocks of the Fisher Information Matrix on Neural Manifold of Full-Parametrised Multilayer Perceptrons
Abstract
It is well known that natural gradient learning (NGL) [1] can avoid being trapped at local optima or on plateaus during training, since it takes the intrinsic geometric structure of the parameter space into account. However, the natural gradient [1] is itself induced by the Fisher information matrix (FIM) [2] defined on the 1-form tangent space [3], so calculating the relevant FIM is key to realizing NGL. This paper gives an explicit derivation and a compact matrix representation of the diagonal blocks, as well as their inverses, of the FIM-based Riemannian metric on the neural manifold of fully parametrised multilayer perceptrons (MLPs), thereby extending and complementing the results partially given in [1] and [3].
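To make the quantities in the abstract concrete, the following is a minimal numerical sketch, not the paper's derivation: it estimates the per-layer diagonal blocks of the FIM empirically from per-sample score vectors of a small one-hidden-layer MLP with Gaussian output noise, inverts each block with a little damping, and takes one natural-gradient step. All sizes and constants (n_in, n_hid, lr, damping) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, n_samples = 3, 4, 1, 512   # illustrative sizes
lr, damping = 0.1, 1e-3                        # step size, Tikhonov damping

# Fully parametrised one-hidden-layer MLP: y = W2 tanh(W1 x + b1) + b2,
# observed under unit-variance Gaussian noise.
W1 = rng.normal(scale=0.5, size=(n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(n_out, n_hid)); b2 = np.zeros(n_out)

def forward(x):
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2, h

def per_sample_scores(x, y):
    """Gradient of log p(y|x) w.r.t. the parameters, split into the two layer blocks."""
    y_hat, h = forward(x)
    err = y - y_hat                               # score factor for unit variance
    gW2, gb2 = np.outer(err, h), err
    back = (W2.T @ err) * (1.0 - h ** 2)          # backprop through tanh
    gW1, gb1 = np.outer(back, x), back
    return np.concatenate([gW1.ravel(), gb1]), np.concatenate([gW2.ravel(), gb2])

# Synthetic training data: an arbitrary smooth target plus unit Gaussian noise.
X = rng.normal(size=(n_samples, n_in))
Y = np.sin(X @ rng.normal(size=(n_in, n_out))) + rng.normal(size=(n_samples, n_out))

# Empirical estimates of the two diagonal FIM blocks, G_k ~ E[s_k s_k^T]
# (an "empirical Fisher" built from observed residuals).
s1, s2 = zip(*(per_sample_scores(x, y) for x, y in zip(X, Y)))
s1, s2 = np.stack(s1), np.stack(s2)
G1 = s1.T @ s1 / n_samples
G2 = s2.T @ s2 / n_samples

# One natural-gradient (ascent) step: precondition the mean score by each block inverse.
g1, g2 = s1.mean(axis=0), s2.mean(axis=0)
d1 = np.linalg.solve(G1 + damping * np.eye(G1.shape[0]), g1)
d2 = np.linalg.solve(G2 + damping * np.eye(G2.shape[0]), g2)

W1 += lr * d1[:n_hid * n_in].reshape(n_hid, n_in); b1 += lr * d1[n_hid * n_in:]
W2 += lr * d2[:n_out * n_hid].reshape(n_out, n_hid); b2 += lr * d2[n_out * n_hid:]
print("natural-gradient step norms:", np.linalg.norm(d1), np.linalg.norm(d2))
```

Keeping only the diagonal blocks and inverting each one separately is much cheaper than inverting the full FIM; the full matrix and its off-diagonal blocks are treated in the paper itself.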
Similar references
Singularities Affect Dynamics of Learning in Neuromanifolds
The parameter spaces of hierarchical systems such as multilayer perceptrons include singularities due to the symmetry and degeneration of hidden units. A parameter space forms a geometrical manifold, called the neuromanifold in the case of neural networks. Such a model is identified with a statistical model, and a Riemannian metric is given by the Fisher information matrix. However, the matrix ...
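As a small illustration of the singularity phenomenon described above (a hedged sketch, not the cited analysis): when two hidden units of an MLP share identical incoming weights, the network function becomes insensitive to several parameter directions (for example, the difference of the two outgoing weights), so the Fisher information matrix is rank deficient. All sizes and names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 2, 3                          # illustrative sizes

W1 = rng.normal(size=(n_hid, n_in))
W1[1] = W1[0]                               # duplicate a hidden unit -> singular point
v = rng.normal(size=n_hid)                  # outgoing weights, scalar output

def model_jacobian(x):
    """d y_hat / d theta for y_hat = v . tanh(W1 x); with unit output variance
    the Fisher metric is E_x[J J^T], so the Jacobian is all we need here."""
    h = np.tanh(W1 @ x)
    dv = h
    dW1 = (v * (1.0 - h ** 2))[:, None] * x[None, :]
    return np.concatenate([dW1.ravel(), dv])

X = rng.normal(size=(2000, n_in))
J = np.stack([model_jacobian(x) for x in X])
G = J.T @ J / len(X)                        # empirical Fisher information matrix

eigvals = np.linalg.eigvalsh(G)
print("three smallest eigenvalues:", eigvals[:3])   # near-zero: degenerate directions
print("rank:", np.linalg.matrix_rank(G, tol=1e-10), "of", G.shape[0])
```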
Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons
The natural gradient learning method is known to have ideal performance for on-line training of multilayer perceptrons. It avoids the plateaus which give rise to slow convergence of the backpropagation method. It is Fisher efficient, whereas the conventional method is not. However, for implementing the method, it is necessary to calculate the Fisher information matrix and its inverse, which is practi...
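The abstract refers to an adaptive scheme that avoids recomputing and inverting the full FIM at every step. Below is a minimal sketch in that spirit: a recursive update of the inverse-FIM estimate from one score vector at a time, tested on synthetic scores with a known covariance. The exact recursion and step-size schedule used in the cited paper may differ; everything here is an illustrative assumption.

```python
import numpy as np

def adaptive_inverse_fim_step(G_inv, score, eps):
    """One recursive update of the inverse-FIM estimate.  It corresponds to
    G_{t+1} = (1 - eps) G_t + eps * score score^T, inverted to first order in eps."""
    Gs = G_inv @ score
    return (1.0 + eps) * G_inv - eps * np.outer(Gs, Gs)

rng = np.random.default_rng(2)
dim = 5
G_inv = np.eye(dim)                                  # initial guess for the inverse FIM
A = rng.normal(size=(dim, dim))
true_G = A @ A.T / dim + np.eye(dim)                 # ground-truth "Fisher" matrix
L = np.linalg.cholesky(true_G)

for t in range(1, 50001):
    score = L @ rng.normal(size=dim)                 # synthetic score with covariance true_G
    G_inv = adaptive_inverse_fim_step(G_inv, score, eps=1.0 / (t + 1000.0))

print("max abs error vs. direct inverse:",
      np.abs(G_inv - np.linalg.inv(true_G)).max())
```

Each step costs only matrix-vector products, so the inverse estimate can be maintained on-line alongside the weight updates instead of forming and inverting the FIM explicitly.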
Fukumizu: Statistical Active Learning in Multilayer Perceptrons
This paper proposes new methods of generating input locations actively in gathering training data, aiming at solving problems special to multilayer perceptrons. One of the problems is that the optimum input locations which are calculated deterministically sometimes result in badly-distributed data and cause local minima in back-propagation training. Two probabilistic active learning methods, w...
Active Learning in Multilayer Perceptrons
We propose an active learning method with hidden-unit reduction, which is devised specially for multilayer perceptrons (MLP). First, we review our active learning method, and point out that many Fisher-information-based methods applied to MLP have a critical problem: the information matrix may be singular. To solve this problem, we derive the singularity condition of an information matrix, and ...
Consistent estimation of the architecture of multilayer perceptrons
We consider regression models involving multilayer perceptrons (MLP) with one hidden layer and Gaussian noise. The estimation of the parameters of the MLP can be done by maximizing the likelihood of the model. In this framework, it is difficult to determine the true number of hidden units using an information criterion, such as the Bayesian information criterion (BIC), because the information mat...
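For readers who want to see how the BIC enters such a comparison, here is a small sketch, assuming a one-hidden-layer MLP with Gaussian noise and maximum-likelihood fits. The residual sums of squares below are placeholders rather than results from the cited paper, and the function names are illustrative.

```python
import numpy as np

def mlp_param_count(n_in, n_hidden, n_out=1):
    """Number of parameters of a one-hidden-layer MLP with biases."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def gaussian_bic(rss, n_samples, n_params):
    """BIC = -2 log L_hat + k log n, with the Gaussian MLE sigma^2 = RSS / n."""
    sigma2 = rss / n_samples
    log_lik = -0.5 * n_samples * (np.log(2 * np.pi * sigma2) + 1.0)
    return -2.0 * log_lik + n_params * np.log(n_samples)

# Illustrative residual sums of squares for candidate architectures
# (placeholders, not data from the cited paper).
n, n_in = 500, 3
for n_hidden, rss in [(1, 260.0), (2, 210.0), (3, 205.0), (4, 204.0)]:
    k = mlp_param_count(n_in, n_hidden)
    print(n_hidden, "hidden units  BIC =", round(gaussian_bic(rss, n, k), 1))
```

The cited paper's point is that this standard recipe is delicate for MLPs precisely because the information matrix can be singular when the model is over-parametrised.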